A Search Task Dataset for German Textual Entailment

Authors

  • Britta D. Zeller
  • Sebastian Padó
Abstract

We present the first freely available large German dataset for Textual Entailment (TE). Our dataset builds on posts from German online forums concerned with computer problems and uses TE to model the task of identifying relevant posts for user queries (i.e., descriptions of their computer problems). We use a sequence of crowdsourcing tasks to create realistic problem descriptions through summarisation and paraphrasing of forum posts. The dataset follows the RTE-5 Search task format and consists of 172 positive and over 2800 negative pairs. We analyse the properties of the created dataset and evaluate its difficulty by applying two TE algorithms and comparing their results with those on the English RTE-5 Search task. The results show that our dataset is roughly comparable to the RTE-5 data in terms of both difficulty and the balance of positive and negative entailment pairs. Our approach to creating task-specific TE datasets can be transferred to other domains and languages.


Similar resources

Evaluating Compound Splitters Extrinsically with Textual Entailment

Traditionally, compound splitters are evaluated intrinsically on gold-standard data or extrinsically on the task of statistical machine translation. We explore a novel way for the extrinsic evaluation of compound splitters, namely recognizing textual entailment. Compound splitting has great potential for this novel task that is both transparent and well-defined. Moreover, we show that it addres...


Natural Language Inference from Multiple Premises

We define a novel textual entailment task that requires inference over multiple premise sentences. We present a new dataset for this task that minimizes trivial lexical inferences, emphasizes knowledge of everyday events, and presents a more challenging setting for textual entailment. We evaluate several strong neural baselines and analyze how the multiple premise task differs from standard tex...


Bar Ilan University Applied Textual Entailment

This thesis introduces the applied notion of textual entailment as a generic empirical task that captures major semantic inferences across many applications. Textual entailment addresses semantic inference as a direct mapping between language expressions and abstracts the common semantic inferences as needed for text based Natural Language Processing applications. We define the task and describ...


ALTN: Word Alignment Features for Cross-lingual Textual Entailment

We present a supervised learning approach to cross-lingual textual entailment that explores statistical word alignment models to predict entailment relations between sentences written in different languages. Our approach is language-independent and was used to participate in the CLTE task (Task 8) organized within SemEval 2013 (Negri et al., 2013). The four runs submitted, one for each languag...


Divide and Conquer: Crowdsourcing the Creation of Cross-Lingual Textual Entailment Corpora

We address the creation of cross-lingual textual entailment corpora by means of crowdsourcing. Our goal is to define a cheap and replicable data collection methodology that minimizes the manual work done by expert annotators, without resorting to preprocessing tools or already annotated monolingual datasets. In line with recent works emphasizing the need of large-scale annotation efforts for te...




Publication date: 2013